[Stack 13/27] Vectorize participant info computation (3-15x speedup) #2437
jucor wants to merge 6 commits into jc/regression-test-perf
Conversation
Pull request overview
This PR optimizes Conversation._compute_participant_info_optimized by replacing the per-participant Python loop with vectorized NumPy computations, and adds tests + a benchmark script to validate and measure the speedup.
Changes:
- Vectorizes participant vote counting and per-group correlation computation in `_compute_participant_info_optimized`.
- Adds a comprehensive unit test suite for participant info behavior and regression against golden snapshots.
- Adds an A/B benchmark script comparing the old loop implementation vs the new vectorized implementation.
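As a rough illustration of the vectorized vote counting described above — a minimal sketch, not the PR's actual code. The matrix layout and vote coding here (1 = agree, -1 = disagree, 0 = pass, NaN = not seen) are illustrative assumptions; the real conventions live in `conversation.py`:

```python
import numpy as np

# Hypothetical vote matrix: rows = participants, columns = comments.
# Assumed coding (illustrative only): 1 = agree, -1 = disagree,
# 0 = pass, NaN = comment not seen.
M = np.array([
    [ 1.0,   -1.0,  0.0,    np.nan],
    [ 1.0,    1.0,  np.nan, -1.0  ],
    [np.nan, -1.0, -1.0,     1.0  ],
])

# One pass over the whole matrix replaces the per-participant loop:
n_agree    = np.sum(M == 1.0,  axis=1)   # per-participant agree count
n_disagree = np.sum(M == -1.0, axis=1)   # per-participant disagree count
n_votes    = np.sum(~np.isnan(M), axis=1)  # ballots cast, including passes

print(n_agree.tolist())     # [1, 2, 1]
print(n_disagree.tolist())  # [1, 1, 2]
print(n_votes.tolist())     # [3, 3, 3]
```

Comparisons against NaN evaluate to False, so the boolean reductions skip unseen comments without any explicit masking.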
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| delphi/polismath/conversation/conversation.py | Replaces per-participant loop with vectorized vote counting and bulk per-group Pearson correlations. |
| delphi/tests/test_participant_info.py | Adds focused unit tests for vote counts, correlations, edge cases, and golden snapshot regression. |
| delphi/scripts/benchmark_participant_info.py | Adds a script to benchmark old vs new implementations on real datasets and sanity-check equivalence. |
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
```diff
 # Create group clusters (vectorized method requires 'center' key)
 group_clusters = [
-    {'id': 1, 'members': ['p1', 'p2']},
-    {'id': 2, 'members': ['p3', 'p4']}
+    {'id': 1, 'members': ['p1', 'p2'], 'center': [0.0]},
+    {'id': 2, 'members': ['p3', 'p4'], 'center': [0.0]}
 ]
```
```diff
-from polismath.pca_kmeans_rep.repness import conv_repness, participant_stats
+from polismath.pca_kmeans_rep.repness import conv_repness
 from polismath.conversation.conversation import Conversation
```
Delphi Coverage Report
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
```python
g = golden_pi[pid_str]
# Find matching pid in computed (could be int or str)
c = computed_pi.get(int(pid_str), computed_pi.get(pid_str))
assert c is not None, f"Missing participant {pid_str} in computed"
```
In test_participant_info_matches_golden, int(pid_str) will raise ValueError if participant IDs in the golden snapshot are non-numeric (possible for some datasets / local fixtures). Consider wrapping the int(...) cast in a try/except (or checking .isdigit()) and falling back to the string key to keep the test robust to non-integer IDs.
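A minimal sketch of the suggested fallback lookup — `lookup_participant`, `computed_pi`, and `pid_str` are illustrative stand-ins for the names used in the test, not the PR's actual code:

```python
def lookup_participant(computed_pi, pid_str):
    """Resolve a golden-snapshot participant ID against computed results.

    Tries the int key only when the ID is numeric, so non-numeric IDs
    (e.g. 'p7') fall through to the string key instead of raising
    ValueError on int(pid_str).
    """
    if pid_str.isdigit():
        c = computed_pi.get(int(pid_str))
        if c is not None:
            return c
    return computed_pi.get(pid_str)

computed_pi = {3: "int-keyed", "p7": "str-keyed"}
print(lookup_participant(computed_pi, "3"))   # int-keyed
print(lookup_participant(computed_pi, "p7"))  # str-keyed
```

Note that `str.isdigit()` is False for negative numbers; if negative participant IDs are possible, a try/except around `int(...)` would be the more robust variant.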
```python
votes_dict, metadata = prepare_votes_data(dataset_name)

# Run full pipeline
conv = Conversation(dataset_name)
conv = conv.update_votes(votes_dict, recompute=True)
```
This golden snapshot test runs a full Conversation.update_votes(..., recompute=True) pipeline per discovered dataset, which appears redundant with tests/test_regression.py::test_conversation_regression (it already computes after_full_recompute via to_dict() which includes participant_info). Consider reusing the regression comparer output / golden stage data, or marking this as a slow/optional test, to avoid adding another full pipeline run per dataset.
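One way to implement the "slow/optional" suggestion is a pytest marker — a sketch only; the test name follows the PR, while the marker name `slow` is an assumption and would need registering in `pytest.ini` or `pyproject.toml`:

```python
import pytest

# Marking the full-pipeline golden test as slow lets CI or local runs
# deselect it by default (e.g. `pytest -m "not slow"`) while still
# allowing an explicit `pytest -m slow` pass per release.
@pytest.mark.slow
def test_participant_info_matches_golden():
    ...  # full Conversation.update_votes(..., recompute=True) pipeline
```

This keeps the regression coverage available without adding another mandatory full pipeline run per dataset.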
```diff
-# Create group clusters
+# Create group clusters (vectorized method requires 'center' key)
```
The comment says the vectorized method requires a 'center' key, but _compute_participant_info_optimized() only reads id and members. Consider adjusting the comment to avoid implying 'center' is required (or explain that it's just matching the usual group_clusters schema).
```suggestion
# Create group clusters (include 'center' to match usual group_clusters schema)
```
```diff
 vote_matrix = pd.DataFrame(vote_data, index=row_names, columns=col_names)

-# Create group clusters
+# Create group clusters (vectorized method requires 'center' key)
```
The comment says the vectorized method requires a 'center' key, but _compute_participant_info_optimized() does not reference 'center' (it only uses id and members). Consider updating the comment to avoid suggesting 'center' is required, or clarify it's only to match the typical group_clusters schema.
```suggestion
# Create group clusters (including optional 'center' key to match typical schema)
```
Commits:

- … imports: Clarify the Beta(2,2) comment to specify the posterior mode (MAP) estimate; import PSEUDO_COUNT from repness.py instead of hardcoding it in simplified_repness_test.py; move the PSEUDO_COUNT import to module level in test_repness_unit.py and test_old_format_repness.py. (Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>)
- Map the plan's "PR N" labels to actual GitHub PR numbers and stack positions; add a table of non-discrepancy PRs; replace the old naming convention with the stack-managed [Stack N/M] prefix.
- Replace the O(N×G×C) per-participant Python loop with bulk NumPy operations: matrix-wide vote counting and per-group Pearson correlation via matrix multiply; the assembly loop only indexes pre-computed arrays. Also adds 31 unit tests (test_participant_info.py) covering vote counts, group correlations, edge cases, and golden snapshot regression.
- Add an A/B benchmark comparing the old per-participant loop vs the new vectorized implementation. Measured 3-15x speedup depending on dataset size.
- …ype assert: Remove the unused group_vote_matrices dict (comment 3); validate --runs >= 1 in the benchmark script (comment 4); tighten the correlation type check to exact `float` (comment 5).
- Delete the O(participants × groups × members) participant_stats() from repness.py (~130 lines) now that _compute_participant_info_optimized() on the Conversation class provides a 3-15x faster NumPy replacement: remove the function from repness.py and __init__.py exports; update all imports (conversation.py, run_analysis.py, 4 test files); rewrite 3 test methods to call the vectorized Conversation method; add a "remove dead code after replacement" principle to the plan.
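The A/B methodology (several runs, median of wall-clock times) can be sketched with a tiny harness — function names `old_impl` and `new_impl` are placeholders, and the workloads here are synthetic stand-ins, not the PR's benchmark script:

```python
import statistics
import time

def time_fn(fn, runs=5):
    """Return the median wall-clock time of `runs` calls to fn."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)  # median, as in the PR's methodology

def old_impl():
    sum(i * i for i in range(50_000))  # stand-in for the per-participant loop

def new_impl():
    sum(i * i for i in range(5_000))   # stand-in for the vectorized path

speedup = time_fn(old_impl) / time_fn(new_impl)
print(f"speedup: {speedup:.1f}x")
```

Taking the median rather than the mean makes the comparison robust to one-off scheduler hiccups, which matters when the measured functions are fast.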
Superseded by spr-managed PR stack. See the new stack starting at #2508.
Summary
- Replaces the per-participant loop in `_compute_participant_info_optimized` with bulk NumPy operations: matrix-wide vote counting (`np.sum` over an axis) and per-group Pearson correlation via a `P @ g` matrix multiply
- Returns correlations as plain `float` instead of `numpy.float64`
- Adds an A/B benchmark script (`scripts/benchmark_participant_info.py`) that runs old vs new on the same data

Benchmark results
Measured on real datasets (5 runs, median), old per-participant loop vs new vectorized:
Speedup is higher on smaller datasets (loop overhead dominates) and lower on very large ones (matrix materialization dominates). Overall 3–15x depending on size.
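The per-group correlation via matrix multiply can be sketched as follows — a minimal reconstruction assuming dense, NaN-free data; the names `P` (participants × comments) and `g` (one group's aggregate vector) follow the summary, everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(6, 10))  # participants x comments, dense (no NaNs)
g = rng.normal(size=10)       # one group's aggregate vote vector

# Center each participant row and the group vector, then a single
# matrix-vector multiply yields every participant's Pearson correlation
# with this group at once (numerator of r for all rows simultaneously).
Pc = P - P.mean(axis=1, keepdims=True)
gc = g - g.mean()
corr = (Pc @ gc) / (np.linalg.norm(Pc, axis=1) * np.linalg.norm(gc))

# Cross-check one entry against NumPy's reference implementation.
assert np.isclose(corr[0], np.corrcoef(P[0], g)[0, 1])
print(corr.shape)  # (6,)
```

With NaNs in play the real implementation would additionally need per-pair masking (or pre-imputed data) before the multiply, which is where the "matrix materialization" cost on large datasets comes from.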
Test plan
🤖 Generated with Claude Code